Extended Language Models for XML Element Retrieval

Internal identifier: 000358 (Main/Exploration)

Authors: Rongmei Li [Netherlands]; Theo Van Der Weide [Netherlands]

Source:

Lecture Notes in Computer Science [0302-9743]; 2011.

RBID: ISTEX:9DB55E3C22A27D13B71FD26B6A7E8CF090DC30D4

English descriptors

Teeft:
- Baseline, Castitle, Context task, Dirichlet priors, Document retrieval, Element retrieval, Full document retrieval, Indri search engine, Inex, Language model, Language models, Magp, Main content, Measure score, Prec, Query, Query model, Query term generation, Ranking function, Relevant characters, Retrieval, Retrieval model, Retrieval tasks, Roman architecture, Simplest language model, Snippet, Snippet retrieval, Weide table, Wikipedia.

Abstract

Abstract: In this paper we describe our participation in the INEX 2010 ad-hoc track. We participated in three retrieval tasks (restricted focused, relevant-in-context, and restricted relevant-in-context) and report our findings using a single set of measures for all tasks. This year, we evaluate the performance of the standard language model when the focus is on a fixed number of relevant characters rather than on relevant paragraphs. Our findings are: 1) the simplest language model for document retrieval performs relatively well in the restricted focused task when using a fixed offset close to the average character distance from the beginning of a document to its main content; 2) good document ranking does improve the performance of snippet retrieval; 3) stemming and stopword removal can further boost performance.
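The record gives only the abstract, so the following is a minimal sketch, not the authors' implementation. It illustrates the two ideas in finding 1. First, documents are ranked with a simple query-likelihood language model. Dirichlet smoothing is assumed here because "Dirichlet priors" appears among the descriptors. Second, for each ranked document, a passage of a fixed number of characters is returned, starting at a fixed character offset. The function names, the whitespace tokenizer, and the values of `mu`, `offset` and `length` are all hypothetical. The sketch does not use the Indri search engine's API, and it applies no stemming or stopword removal.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query, doc_tokens, coll_tf, coll_len, mu=2500.0):
    """Log P(query | document) under a Dirichlet-smoothed unigram model (assumed)."""
    tf = Counter(doc_tokens)
    dlen = len(doc_tokens)
    score = 0.0
    for term in query:
        p_coll = coll_tf.get(term, 0) / coll_len
        if p_coll == 0.0:
            continue  # term unseen in the whole collection: skip to avoid log(0)
        score += math.log((tf[term] + mu * p_coll) / (dlen + mu))
    return score


def fixed_offset_passage(text, offset, length):
    """Return (start, passage): `length` characters from a fixed character offset.

    The start is pulled back so the passage stays `length` characters long
    near the end of the text, and clamped to 0 for short texts.
    """
    start = max(0, min(offset, len(text) - length))
    return start, text[start:start + length]


def rank_and_extract(query, docs, offset, length, mu=2500.0):
    """Rank `docs` (id -> text) by query likelihood, then cut fixed-offset passages."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    coll_tf = Counter()
    for tokens in tokenized.values():
        coll_tf.update(tokens)
    coll_len = sum(coll_tf.values()) or 1  # guard against an empty collection
    q = query.lower().split()
    ranked = sorted(
        docs,
        key=lambda d: dirichlet_log_likelihood(q, tokenized[d], coll_tf, coll_len, mu),
        reverse=True,
    )
    return [(doc_id, *fixed_offset_passage(docs[doc_id], offset, length)) for doc_id in ranked]


if __name__ == "__main__":
    docs = {
        "a": "roman architecture arches and domes",
        "b": "modern music in saarland",
    }
    for doc_id, start, passage in rank_and_extract("Roman architecture", docs, offset=6, length=12):
        print(doc_id, start, repr(passage))
```

In the abstract's setting, `offset` would be set close to the average character distance from the start of a document to its main content. Here it is just an illustrative constant.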

Url:

https://api.istex.fr/document/9DB55E3C22A27D13B71FD26B6A7E8CF090DC30D4/fulltext/pdf

DOI: 10.1007/978-3-642-23577-1_8

Affiliations:

Links toward previous steps (curation, corpus...)

to stream Istex, to step Corpus: 001030
to stream Istex, to step Curation: 000F51
to stream Istex, to step Checkpoint: 000219
to stream Main, to step Merge: 000358
to stream Main, to step Curation: 000358

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Extended Language Models for XML Element Retrieval</title>
<author><name sortKey="Li, Rongmei" sort="Li, Rongmei" uniqKey="Li R" first="Rongmei" last="Li">Rongmei Li</name>
</author>
<author><name sortKey="Van Der Weide, Theo" sort="Van Der Weide, Theo" uniqKey="Van Der Weide T" first="Theo" last="Van Der Weide">Theo Van Der Weide</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:9DB55E3C22A27D13B71FD26B6A7E8CF090DC30D4</idno>
<date when="2011" year="2011">2011</date>
<idno type="doi">10.1007/978-3-642-23577-1_8</idno>
<idno type="url">https://api.istex.fr/document/9DB55E3C22A27D13B71FD26B6A7E8CF090DC30D4/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001030</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001030</idno>
<idno type="wicri:Area/Istex/Curation">000F51</idno>
<idno type="wicri:Area/Istex/Checkpoint">000219</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000219</idno>
<idno type="wicri:doubleKey">0302-9743:2011:Li R:extended:language:models</idno>
<idno type="wicri:Area/Main/Merge">000358</idno>
<idno type="wicri:Area/Main/Curation">000358</idno>
<idno type="wicri:Area/Main/Exploration">000358</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Extended Language Models for XML Element Retrieval</title>
<author><name sortKey="Li, Rongmei" sort="Li, Rongmei" uniqKey="Li R" first="Rongmei" last="Li">Rongmei Li</name>
<affiliation wicri:level="3"><country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Radboud University, Nijmegen</wicri:regionArea>
<placeName><settlement type="city">Nimègue</settlement>
<region type="province" nuts="2">Gueldre</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Van Der Weide, Theo" sort="Van Der Weide, Theo" uniqKey="Van Der Weide T" first="Theo" last="Van Der Weide">Theo Van Der Weide</name>
<affiliation wicri:level="3"><country xml:lang="fr">Pays-Bas</country>
<wicri:regionArea>Radboud University, Nijmegen</wicri:regionArea>
<placeName><settlement type="city">Nimègue</settlement>
<region type="province" nuts="2">Gueldre</region>
</placeName>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2011</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Baseline</term>
<term>Castitle</term>
<term>Context task</term>
<term>Dirichlet priors</term>
<term>Document retrieval</term>
<term>Element retrieval</term>
<term>Full document retrieval</term>
<term>Indri search engine</term>
<term>Inex</term>
<term>Language model</term>
<term>Language models</term>
<term>Magp</term>
<term>Main content</term>
<term>Measure score</term>
<term>Prec</term>
<term>Query</term>
<term>Query model</term>
<term>Query term generation</term>
<term>Ranking function</term>
<term>Relevant characters</term>
<term>Retrieval</term>
<term>Retrieval model</term>
<term>Retrieval tasks</term>
<term>Roman architecture</term>
<term>Simplest language model</term>
<term>Snippet</term>
<term>Snippet retrieval</term>
<term>Weide table</term>
<term>Wikipedia</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In this paper we describe our participation in the INEX 2010 ad-hoc track. We participated in three retrieval tasks (restricted focused, relevant-in-context, and restricted relevant-in-context) and report our findings using a single set of measures for all tasks. This year, we evaluate the performance of the standard language model when the focus is on a fixed number of relevant characters rather than on relevant paragraphs. Our findings are: 1) the simplest language model for document retrieval performs relatively well in the restricted focused task when using a fixed offset close to the average character distance from the beginning of a document to its main content; 2) good document ranking does improve the performance of snippet retrieval; 3) stemming and stopword removal can further boost performance.</div>
</front>
</TEI>
<affiliations><list><country><li>Pays-Bas</li>
</country>
<region><li>Gueldre</li>
</region>
<settlement><li>Nimègue</li>
</settlement>
</list>
<tree><country name="Pays-Bas"><region name="Gueldre"><name sortKey="Li, Rongmei" sort="Li, Rongmei" uniqKey="Li R" first="Rongmei" last="Li">Rongmei Li</name>
</region>
<name sortKey="Van Der Weide, Theo" sort="Van Der Weide, Theo" uniqKey="Van Der Weide T" first="Theo" last="Van Der Weide">Theo Van Der Weide</name>
</country>
</tree>
</affiliations>
</record>

To handle this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000358 | SxmlIndent | more

EXPLOR_AREA=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000358 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Wicri/Sarre
   |area=    MusicSarreV3
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:9DB55E3C22A27D13B71FD26B6A7E8CF090DC30D4
   |texte=   Extended Language Models for XML Element Retrieval
}}

This area was generated with Dilib version V0.6.33.
Data generation: Sun Jul 15 18:16:09 2018. Site generation: Tue Mar 5 19:21:25 2024

	Exploration server on music in the Saarland (Serveur d'exploration sur la musique en Sarre)
	Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.
